Hybrid Optimization Algorithm for Large-Scale QoS-Aware Service Composition
Abstract. In this paper we present a hybrid approach for automatic composition of Web services that generates semantic input-output based compositions with optimal end-to-end QoS, minimizing the number of services of the resulting composition. The proposed approach has four main steps: 1) generation of the composition graph for a request; 2) computation of the optimal composition that minimizes a single objective QoS function; 3) multi-step optimizations to reduce the search space by identifying equivalent and dominated services; and 4) hybrid local-global search to extract the optimal QoS with the minimum number of services. An extensive validation with the datasets of the Web Service Challenge 2009-2010 and randomly generated datasets shows that: 1) the combination of local and global optimization is a general and powerful technique to extract optimal compositions in diverse scenarios; and 2) the hybrid strategy performs better than the state of the art, obtaining solutions with fewer services and optimal QoS.

1 Introduction

Web services are self-describing software applications that can be published, discovered and invoked across the Web using standard technologies [?]. The functionality of a Web service is mainly determined by the functional properties that describe its behaviour in terms of its inputs and outputs, and possibly additional descriptions such as preconditions and effects. These four characteristics, commonly abbreviated IOPEs, allow the composition and aggregation of Web services into composite Web services that achieve more complex functionalities and, therefore, solve complex user needs that cannot be satisfied with atomic Web services. However, compositions should go beyond achieving a concrete functionality and take into account other requirements such as Quality-of-Service (QoS), so that the generated compositions also fit the needs of different contexts. The QoS determines the value of different quality properties of services such as response time (total time a service takes to respond to a request) or throughput (number of invocations supported in a given time interval), among other characteristics. These properties apply both to single services and to composite services, where each individual service in the composition contributes to the global QoS. For composite services this implies that having many different services with similar or identical functionality, but different QoS, may lead to a large number of possible compositions that satisfy the same functionality with different QoS but also with a different number of services.

However, the problem of generating automatic compositions that satisfy a given request with an optimal QoS is a very complex task, especially in large-scale environments, where many service providers offer services with similar functionality but with different QoS. This has motivated researchers to explore efficient strategies to generate QoS-aware Web service compositions from different perspectives [?], [?]. But despite the large number of strategies proposed so far, the problem of finding automatic compositions that minimize the number of services while guaranteeing the optimal end-to-end QoS is rarely considered. Instead, most of the work has focused on optimizing the global QoS of a composition or improving the execution time of the composition engines. An analysis of the literature shows that only a few works take into consideration the number of services of the resulting optimal QoS compositions. Some notable examples are [?]-[?]. Although most of these composition engines are quite efficient in terms of computation time, none of them is able to effectively minimize the total number of services of the solution while keeping the optimal QoS.

The ability to provide not only optimal QoS but also an optimal number of services is especially important in large-scale scenarios, where the large number of services and the possible interactions among them may lead, for a given problem, to a vast number of possible solutions with a different number of services but the same optimal QoS. Moreover, there can be situations where certain QoS values are missing or cannot be measured. Although the prediction of QoS can partially alleviate this problem, it is not always possible to have historical data in order to build statistical models that accurately predict missing QoS. In this context, optimizing not only the available QoS but also the number of services of the composition may indirectly improve other missing properties. This has important benefits for brokers, customers and service providers. From the broker point of view, the generation of smaller compositions is interesting to achieve manageable compositions that are easier to execute, monitor, debug, deploy and scale. On the other hand, customers can also benefit from smaller compositions, especially when there are multiple solutions with the same optimal end-to-end QoS but a different number of services. This is even more important when service providers do not offer fine-grained QoS metrics, since decreasing the number of services involved in the composition may indirectly improve other quality parameters such as communication overhead, risk of failure, connection latency, etc. This is also interesting from the perspective of service providers. For example, if the customer wants the cheapest composition, the solution with fewer services from the same provider may also require fewer resources for the same task.

However, one of the main difficulties when looking for optimal solutions is that it usually requires exploring the complete search space among all possible combinations of services, which is a hard combinatorial problem. In fact, finding the optimal composition with the minimum number of services is NP-Hard (see Appendix A). Thus, achieving a reasonable trade-off between solution quality and execution time in large-scale environments is far from trivial, and hardly achievable without adequate optimizations.

In this paper we focus on the automatic generation of semantic input-output compositions, minimizing both a single QoS criterion and the total number of services subject to the optimal QoS. The main contributions are:

• A multi-step optimization pipeline based on the analysis of non-relevant, equivalent and dominated services in terms of interface functionality and QoS.

• A fast local search strategy that is guaranteed to obtain a near-optimal number of services while satisfying the optimal end-to-end QoS for an input-output based composition request.

• An optimal combinatorial search that can improve the solution obtained with the local search strategy by performing an exhaustive combinatorial search to select the composition with the minimum number of services for the optimal QoS.

We tested our proposal using the Web Service Challenge 2009-2010 datasets and also a different, randomly generated dataset with a variable number of services. The rest of the paper is organized as follows: we first introduce the composition problem, then describe the proposed approach, present the results obtained, and give some final remarks.

2 Related Work 

Automatic composition of services is a fundamental and complex problem in the field of Service Oriented Computing, which has been approached from many different perspectives depending on what kinds of assumptions are made [?], [?], [?], [?]. AI Planning techniques have been traditionally used in service composition to generate valid composition plans by mapping services to actions in the planning domain [11]-[16]. These techniques work under the assumption that services are complex operators that are well defined in terms of IOPEs, so the problem can be translated to a planning problem and solved using classical planning algorithms. Most of these approaches have been mainly focused on exploiting semantic techniques [?], [16], [?] and developing heuristics [?], [?], [?] to improve the performance of the planners. As a result, and partly due to the complexity of generating satisfiable plans in the planning domain, these approaches generate neither optimal plans (minimizing the number of actions) nor optimal QoS-aware compositions.

Other approaches have studied the QoS-aware composition problem from the perspective of Operations Research, providing interesting strategies for optimal selection of services and optimizing the global QoS of the composition subject to multiple QoS constraints. A common strategy is to reduce the composition problem to a combinatorial Knapsack-based problem, which is generally solved using constraint satisfaction algorithms (such as Integer Programming) or Evolutionary Algorithms [?], [25]. Some relevant approaches are [?], [?]. In [?] the authors present AgFlow, a QoS middleware for service composition. They analyze two different methods for QoS optimization, a local selection and a global selection strategy. The second strategy is able to optimize the global end-to-end QoS of the composition using an Integer Linear Programming method, which performs better than the suboptimal local selection strategy. Similarly, in [?] the authors propose a hybrid QoS selection approach that combines a global optimization strategy with local selection for large-scale QoS composition. The assumption made by all these approaches is that there is only one composition workflow with a fixed set of abstract tasks, where each abstract task can be implemented by a concrete service. Both the composition workflow and the service candidates for each abstract task are assumed to be predefined, so these techniques are not able to produce compositions with variable size.

A different category of techniques are graph-based approaches that 1) generate the entire composition by selecting and combining relevant services and 2) optimize the global QoS of the composition. These techniques usually combine variants or new ideas inspired by different fields, such as AI Planning, Operations Research or Heuristic Search, in order to solve the automatic QoS composition problem more efficiently, usually for a single QoS criterion. Some relevant approaches in this category are the top-3 winners of the Web Service Challenge (WSC) 2009-2010. Concretely, the winners of the WSC [?] presented an approach that automatically discovers and composes services, optimizing the global QoS. This approach also includes an optimization phase to reduce the number of services of the solution. Although the proposed algorithm has in general good performance, as demonstrated in the WSC, it cannot guarantee optimal solutions in terms of number of services. The other participants of the WSC have the same limitation.

An interesting approach in this category has been recently presented by Jiang et al. [?]. In this paper, the authors analyze the problem of generating top-K query compositions by relaxing the optimality of the QoS in order to introduce service variability. However, the compositions are generated at the expense of worsening the optimal QoS, instead of looking first for all possible composition alternatives with the minimum number of services that guarantee the optimal QoS.

Another interesting graph-based approach has been presented in [?]. In this paper, the authors propose a service removal strategy that detects services that are redundant in terms of functionality and QoS. Results show that service removal techniques can be very effective to reduce the number of services before extracting the final composition, as anticipated by other similar approaches [27]-[29]. However, some important limitations of this work are: 1) the QoS is not always optimal, since the graph generated for the composition is not complete, as it does not contain all the relations between services (it is acyclic); and 2) although redundancy removal is an effective technique that can also be used to prune the search space, this strategy by itself cannot provide optimal results in terms of number of services, and it should be combined with exhaustive search to improve the solutions obtained.

In summary, despite the large number of approaches for automatic QoS-aware service composition, there is a lack of efficient techniques that are able not only to optimize the global end-to-end QoS, but also to effectively minimize the number of services of the composition. This paper aims to provide an efficient graph-based approach that uses a hybrid local-global optimization algorithm in order to find optimal compositions both in terms of a single QoS criterion and in terms of the minimum number of services.


A motivating example of the problem is shown in Fig. 1. The figure represents a graph with all the relevant services for a request R where the inputs are {ont3:IPAddress, ont2:MerchantCode} and the output is {xsd:boolean}. The goal of this example is to obtain a composition to predict whether a business transaction is fraudulent or not. Each service (associated with a response time QoS) is represented by a square. Inputs and outputs are represented by circles. The graph also contains edges connecting outputs and inputs. These edges represent valid semantic matches whenever an output of a service can be passed as an input of a different service. As can be seen, there are some inputs {ont1:Location, ont3:Payment} that can be matched by more than one output, so there are many different ways to combine services to achieve the same goal.

Although finding the proper combination of services in terms of their inputs/outputs is essential to generate a solution, it is not enough to obtain good compositions, since there can exist different combinations of services with different QoS. Moreover, many different combinations of services may produce compositions with a different number of services but the same end-to-end QoS. For example, in Fig. 1 we can select the WS E-Payment service or the Secure Payment service to process the electronic payment. However, the second service has a higher response time. Using it leads to a sub-optimal end-to-end QoS of 420 ms. There are other situations where the selection of different services leads to compositions with different sizes but the same end-to-end QoS. For example, either the Free Geoloc Service or the Premium Geoloc Service can be selected to translate an IP to a Location. Although the second one has a better average response time (40 ms), it requires an additional service to obtain the ClientID for verification purposes. Selecting the Premium Geoloc Service or the Free Geoloc Service does not have an impact on the global QoS, since the ML Predictor Service has to wait longer to obtain the Transaction parameter (200 ms), but it has an impact on the total number of services of the solution.

The goal of this paper is to automatically generate, given a composition request, a graph like the one represented in Fig. 1, as well as to extract the optimal end-to-end QoS composition with the minimum number of services from that graph.


3 Motivation 

The aim of the automatic service composition problem, as considered in this paper, is to automatically select the best combination of available QoS-aware services in a way that can fulfil a user request that otherwise could not be solved by just invoking a single, existing service. This request is specified in terms of the information that the user provides (inputs), and the information it expects to obtain (outputs). The resulting composition should meet this request with an optimal, single-criterion end-to-end QoS and using as few services as possible.


4 Problem Formulation 

We herein formalize the main concepts and assumptions regarding the composition model used in our approach, which consists of a semantic, graph-centric representation of the service composition. These concepts are captured in three main models: 1) a service model, which is used to represent services and define how services can be connected or matched to generate composite services; 2) a graph-based composition model, which is used to represent both service interactions and compositions; and 3) a QoS computation model, which provides the operators required to compute the global QoS in a graph-based composition.

Fig. 1. Example of a Service Match Graph for a request R = {{ont3:IPAddress, ont2:MerchantCode}, {xsd:boolean}} to predict whether a business transaction is fraudulent or not. Each service is associated with an average response time. The optimal solution (Service Composition Graph), with an overall response time of 410 ms and 4 services (excluding So and Si) is highlighted.

4.1 Semantic Service Model

The automatic composition of services requires a mechanism to select appropriate services based on their functional descriptions, as well as to automatically match the services together by linking their inputs and outputs to generate executable data-flow compositions. To this end, we introduce here the main concepts that we use in this paper to support the automatic generation of compositions. This model is an extension of a previous model used in [?] to include QoS properties.

Definition 1. A Composition Request R is defined as a tuple $R = \{I_R, O_R\}$, where $I_R$ is the set of provided inputs, and $O_R$ the set of expected outputs. Each input and output is related to a semantic concept from the set $C$ of the concepts defined in an ontology Ont ($I_R, O_R \subseteq C$). We say that a composition satisfies the request R if it can be invoked with the inputs in $I_R$ and returns the outputs in $O_R$.

Definition 2. A Semantic Web Service (hereafter "service") can be defined as a tuple $w = \{In_w, Out_w, Q_w\} \in W$ where $In_w$ is the set of inputs required to invoke $w$, $Out_w$ is the set of outputs returned by $w$ after its execution, $Q_w = \{q_w^1, \ldots, q_w^n\}$ is the set of QoS values associated to the service, and $W$ is the set of all services available in the service registry.

Each input and output is related to a semantic concept from the set $C$ of the concepts defined in an ontology Ont ($In_w, Out_w \subseteq C$). Each QoS value $q_w^i \in Q_w$ has a concrete type associated to a set of valid values $Q$. For example, the QoS values of a service $w$ with two different measures, an average response time of 20 ms and an average throughput of 1000 invocations/second, are represented as $Q_w = \{20\,ms, 1000\,inv/s\}$, where $20\,ms \in Q_{RT}$ and $1000\,inv/s \in Q_{TH}$.


Semantic inputs and outputs are used to compose the functionality of multiple services by matching their inputs and outputs together. In order to measure the quality of the match, we need a matchmaking mechanism that exploits the semantic I/O information of the services. The different matchmaking degrees that are contemplated are exact, plugin, subsumes and fail [?].

Definition 3. Given $a, b \in C$, $degree(a, b)$ returns the degree of match between both concepts (exact, plugin, subsume or fail), which is determined by the logical relationship of both concepts within the Ontology.

Definition 4. Given $a, b \in C$, $match(a, b)$ holds if $degree(a, b) \neq fail$.

In order to determine which concepts are matched by other concepts, we define a matchmaking operator "$\otimes$" that, given two sets of concepts $C_1, C_2 \subseteq C$, returns the concepts from $C_2$ matched by $C_1$.

Definition 5. Given $C_1, C_2 \subseteq C$, we define "$\otimes : C \times C \to C$" such that $C_1 \otimes C_2 = \{c_2 \in C_2 \mid match(c_1, c_2), c_1 \in C_1\}$.

We can use the previous operator to define the concepts of full and partial matching between concepts.

Definition 6. Given $C_1, C_2 \subseteq C$, a full matching between $C_1$ and $C_2$ exists if $C_1 \otimes C_2 = C_2$, whereas a partial matching exists if $C_1 \otimes C_2 \subset C_2$.

Definition 7. Given a set of concepts $C' \subseteq C$, a service $w = \{In_w, Out_w\}$ is invokable if $C' \otimes In_w = In_w$, i.e., there is a full match between the provided set of concepts $C'$ and $In_w$, so the information required by $w$ is fully satisfied.
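
Definitions 3 to 7 can be illustrated with a minimal Python sketch. The toy ontology `PARENT`, the function names, and in particular the direction assumed for plugin (offered concept more specific than the required one) and subsume (offered concept more general) are assumptions made for illustration only; the text above does not fix them.

```python
from enum import Enum


class Degree(Enum):
    EXACT = "exact"
    PLUGIN = "plugin"
    SUBSUME = "subsume"
    FAIL = "fail"


# Hypothetical toy ontology: child concept -> parent concept.
PARENT = {
    "GeoLocation": "Location",
    "Place": "Location",
    "Location": "Thing",
}


def ancestors(concept):
    """Return all strict superclasses of a concept in the toy ontology."""
    result = set()
    while concept in PARENT:
        concept = PARENT[concept]
        result.add(concept)
    return result


def degree(a, b):
    """Degree of match between an offered concept a and a required concept b."""
    if a == b:
        return Degree.EXACT
    if b in ancestors(a):   # a is more specific than b (assumed meaning of plugin)
        return Degree.PLUGIN
    if a in ancestors(b):   # a is more general than b (assumed meaning of subsume)
        return Degree.SUBSUME
    return Degree.FAIL


def match(a, b):
    """Definition 4: a match holds whenever the degree is not fail."""
    return degree(a, b) is not Degree.FAIL


def matched(c1, c2):
    """Definition 5: the concepts of c2 matched by at least one concept of c1."""
    return {y for y in c2 if any(match(x, y) for x in c1)}


def is_invokable(available, inputs):
    """Definition 7: full match between the available concepts and the inputs."""
    return matched(available, inputs) == set(inputs)
```

With this sketch, `is_invokable({"Place"}, {"Location"})` holds because `Place` is a subclass of `Location` in the toy ontology, whereas adding an unrelated required input makes the match only partial.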

This internal model used by the algorithm, which captures the core components required to perform semantic matchmaking and composition of services, is agnostic to how semantic services are represented. Thus, the algorithm is not bound to any concrete service description. Concretely, different service descriptions can be handled by the algorithm through the use of iServe importers for OWL-S, WSMO-Lite, SAWSDL or MicroWSMO. For further details see [32].

4.2 Graph-Based Composition Model

In a nutshell, a data-flow composition of services can be seen as a set of services connected together through their inputs and outputs, using the semantic model defined before, in a way that every service in the composition is invocable and the invocation of each service in the composition can transform a set of inputs into a set of outputs. These concepts can be naturally captured by graphs, where the vertices represent inputs, outputs and services, and the edges represent semantic matches between inputs and outputs. Here we define the notion of Service Match Graph and Service Composition Graph. The Service Match Graph is a graph that captures all the existing dependencies (matches) between all the relevant services for a composition request. The Service Composition Graph is a particular case of the Service Match Graph that represents a composition contained in the Service Match Graph.

The Service Match Graph represents the space of all possible valid solutions for a composition request R, and it is defined as a directed graph $G_S = (V, E)$, where:

• $V = W_R \cup I \cup O \cup \{S_o, S_i\}$ is the set of vertices of the graph, where $W_R \subseteq W$ is the set of relevant services, $I$ is the set of inputs and $O$ is the set of outputs. $S_o$ and $S_i$ are two special services, called Source and Sink, defined as $S_o = \{\emptyset, I_R\}$ and $S_i = \{O_R, \emptyset\}$.

• $E = IW \cup WO \cup OI$ is the set of edges in the graph, where:

o $IW \subseteq \{(i_w, w) \mid i_w \in I \wedge w \in W\}$ is the set of input edges, i.e., edges connecting input concepts to their services.

o $WO \subseteq \{(w, o_w) \mid w \in W \wedge o_w \in O\}$ is the set of output edges, i.e., edges connecting services with their output concepts.

o $OI \subseteq \{(o_w, i_{w'}) \mid o_w, i_{w'} \in (I \cup O) \wedge match(o_w, i_{w'})\}$ is the set of edges that represent a semantic match between an output of $w$ and an input of $w'$.

There are also some restrictions in the edge set to ensure that each input/output belongs to a single service:

• $\forall i \in I, d^{+}_{G_S}(i) = 1 \wedge ch_{G_S}(i) = \{w\}, w \in W$ (each input has only one outgoing edge, which connects the input with its service)

• $\forall o \in O, d^{-}_{G_S}(o) = 1 \wedge par_{G_S}(o) = \{w\}, w \in W$ (each output has only one incoming edge, which connects the output with its service)

Function $d^{+}_{G_S}(v)$ returns the outdegree of a vertex $v \in G_S$ (number of children vertices connected to $v$), whereas $d^{-}_{G_S}(v)$ returns the indegree of a vertex $v$ (number of parent vertices connected to $v$). The functions $ch_{G_S}(v)$ and $par_{G_S}(v)$ return the children vertices and the parent vertices of $v \in G_S$, respectively.

Fig. 1 shows an example of a Service Match Graph where each service is associated with its average response time. As can be seen, this graph contains many different compositions, since there are inputs in the graph that can be matched by the outputs of different services. For example, the parent nodes of the input ont1:Location of the service ML Service Predictor ($par_{G_S}(ont1{:}Location)$) in Fig. 1 are ont1:GeoLocation and ont1:Place, so the input is matched by two outputs: $d^{-}_{G_S}(ont1{:}Location) = 2$.

A Service Composition Graph, denoted as $G_C = (V, E)$, represents a solution for the composition request where each input is matched by exactly one output. Formally, it is a subgraph of the Service Match Graph ($G_C \subseteq G_S$) that satisfies the following conditions:

• $\forall i \in I, d^{-}_{G_C}(i) = 1$ (each input is strictly matched by one output)

• $G_C$ is a Directed Acyclic Graph (DAG)

These conditions are important in order to guarantee that a solution is valid, i.e., each input is matched by an output of a service and each service is invocable (all inputs in the composition are matched with no cyclic dependencies). This definition of service composition is language-agnostic, so the resulting DAG is a representation of a solution for the composition problem which can be translated to a concrete language, such as OWL-S or BPEL.
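
The two conditions of a Service Composition Graph can be checked mechanically. The following Python sketch assumes a hypothetical encoding that is not part of the paper: each service maps to its sets of input and output identifiers, and the match edges are given as (output, input) pairs. It uses the standard `graphlib` module (Python 3.9+) to detect cycles in the induced service dependency graph.

```python
from collections import Counter
from graphlib import CycleError, TopologicalSorter


def is_valid_composition(services, match_edges):
    """Check the two conditions of a Service Composition Graph.

    services: name -> (set of input ids, set of output ids)  [assumed encoding]
    match_edges: iterable of (output id, input id) pairs, i.e. the OI edges
    """
    owner_of_output = {o: w for w, (_, outs) in services.items() for o in outs}
    owner_of_input = {i: w for w, (ins, _) in services.items() for i in ins}
    edges = list(match_edges)

    # Condition 1: every input is matched by exactly one output.
    indegree = Counter(i for _, i in edges)
    if any(indegree[i] != 1 for i in owner_of_input):
        return False

    # Condition 2: the service dependencies induced by the matches are acyclic.
    deps = {w: set() for w in services}
    for o, i in edges:
        deps[owner_of_input[i]].add(owner_of_output[o])
    try:
        TopologicalSorter(deps).prepare()
    except CycleError:
        return False
    return True


# Hypothetical three-service chain: So -> A -> Si.
chain = {
    "So": (set(), {"so.ip"}),
    "A": ({"a.ip"}, {"a.loc"}),
    "Si": ({"si.loc"}, set()),
}
chain_edges = [("so.ip", "a.ip"), ("a.loc", "si.loc")]
```

Here `is_valid_composition(chain, chain_edges)` holds, while removing an edge (an unmatched input) or wiring two services to each other's inputs (a cycle) makes the check fail.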

4.3 QoS Computation Model 

Before looking for optimal QoS service compositions, we first need to define a model to work with QoS over compositions of services, which allows us to determine the best QoS that can be achieved for a given composition request on a service repository. When many services are chained together in a composition, the QoS of each individual service contributes to the global QoS of the composition. For example, suppose we want to measure the total response time of a simple composition with two services chained in sequence. The total response time is calculated as the sum of the response times of each service in the composition. However, if the composition has two services in parallel, the total time of the composition is given by the slowest service. Thus, the calculation of the QoS of a composition depends on the type of the QoS and on the structure of the composition.

In order to define the common rules to operate with QoS values in composite services, many approaches use a QoS computation model based on workflow patterns [?], which is adequate to measure the QoS of control-based compositions. However, this paper focuses on the automatic generation of optimal QoS-aware compositions driven by the data-flow analysis of the service dependencies (input-output matches) that are represented as a Service Match Graph.

In this section we explain the general graph-centric QoS computation model that we use, based on the path algebra defined in [34]. This model is better suited to compute QoS values in a Service Match Graph and, by extension, is also applicable to the particular case of the Service Composition Graph.


Definition 8. $(Q, \oplus, \ominus, \preceq)$ is a QoS algebraic structure to operate with a set of QoS values, denoted as $Q$. This set is equipped with the following elements:

• $\oplus : Q \times Q \to Q$ is a closed binary operation for aggregating QoS values

• $\ominus : Q \times Q \to Q$ is a binary operation for subtracting QoS values

• $\preceq$ is a total order relation on $Q$

This algebraic structure has the following properties:

1) $Q$ is closed under $\oplus$ (any aggregation of two QoS values always returns a QoS value)

2) The set $Q$ contains an identity element $e$ such that $\forall a \in Q, a \oplus e = e \oplus a = a$

3) The set $Q$ contains a zero element $\phi$ such that $\forall a \in Q, \phi \oplus a = a \oplus \phi = \phi$

4) The operator $\oplus$ is associative

5) The operator $\oplus$ is monotone for $\preceq$ (preserves order). This implies that $\forall a, b, c \in Q, a \preceq b \Rightarrow a \oplus c \preceq b \oplus c$

6) The operator $\ominus$ is the inverse of $\oplus$: $a \oplus b = c \Rightarrow a = c \ominus b$

Table 1 shows an example of the concrete elements in this algebra. Note that, for the sake of brevity, only the response time and throughput operators are represented in Table 1. However, other QoS properties such as cost, availability, reputation, etc., can also be defined by instantiating the corresponding operators. We denote by $Q_{RT}$ the set of QoS values for response time (in milliseconds) and by $Q_{TH}$ the set of QoS values for throughput (invocations/second). The total order comparator $\preceq$ is required to be able to order and compare different QoS values. Given two QoS values $a, b \in Q$, $a \preceq b$ means that $a$ is equal or better than $b$, whereas $b \preceq a$ means that $a$ is equal or worse than $b$. The order depends on the concrete comparator defined on $Q$. For example, $Q_{RT}$ uses the comparator $<$ to order the response time, so for $a, b \in Q_{RT}$, $a \preceq b \Leftrightarrow a < b$. For example, given two response times $10\,ms, 20\,ms \in Q_{RT}$, $10\,ms \preceq 20\,ms$ (10 ms is better than 20 ms) since $10\,ms < 20\,ms$. However, $Q_{TH}$ uses the comparator $>$, so for $a, b \in Q_{TH}$, $a \preceq b \Leftrightarrow a > b$. For example, given two throughput values $10\,inv/s, 20\,inv/s \in Q_{TH}$, $20\,inv/s \preceq 10\,inv/s$ (20 inv/s is better than 10 inv/s) since $20\,inv/s > 10\,inv/s$. This order relation also affects the behavior of the min and max functions. The min function always selects the best QoS value, whereas the max function always selects the worst QoS value.
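
As an illustration of how this algebra can be instantiated, the sketch below encodes the response time and throughput rows of Table 1 in Python. The class and attribute names are hypothetical, and the order is implemented as a non-strict comparison so that "equal or better" is reflexive; `best` and `worst` play the roles that min and max have in the text.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QoSAlgebra:
    aggregate: Callable[[float, float], float]        # aggregation operator
    identity: float                                   # identity element e
    zero: float                                       # zero (absorbing) element
    better_or_equal: Callable[[float, float], bool]   # total order on QoS values

    def best(self, values):
        """Select the best value under the order (the 'min' of the text)."""
        values = list(values)
        result = values[0]
        for v in values[1:]:
            if not self.better_or_equal(result, v):
                result = v
        return result

    def worst(self, values):
        """Select the worst value under the order (the 'max' of the text)."""
        values = list(values)
        result = values[0]
        for v in values[1:]:
            if not self.better_or_equal(v, result):
                result = v
        return result


# Response time: values add up, lower is better.
RESPONSE_TIME = QoSAlgebra(
    aggregate=lambda a, b: a + b,
    identity=0.0,
    zero=math.inf,
    better_or_equal=lambda a, b: a <= b,
)

# Throughput: the bottleneck dominates, higher is better.
THROUGHPUT = QoSAlgebra(
    aggregate=min,
    identity=math.inf,
    zero=0.0,
    better_or_equal=lambda a, b: a >= b,
)
```

For two services in sequence with response times of 20 ms and 40 ms, `RESPONSE_TIME.aggregate(20, 40)` gives 60 ms; for two throughput values, `THROUGHPUT.best([10, 20])` selects 20 inv/s, in line with the order described above.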

Definition 9. $F_Q(w) : W \to Q$ is a function that, given a service $w \in W$, returns its corresponding QoS value from $Q_w$ with type $Q$. This function can be seen as a function to measure the QoS of a service.

For example, in Fig. 1, $F_{Q_{RT}}(Trans.\ Service) = 130\,ms$.

TABLE 1
QoS algebra elements for response time and throughput

QoS ($Q$)                                        $a \oplus b$    $a \ominus b$   $e$         $\phi$      Order ($\preceq$)
$Q_{RT} = \mathbb{R}_{\ge 0} \cup \{\infty\}$    $a + b$         $a - b$         $0$         $\infty$    $<$
$Q_{TH} = \mathbb{R}_{> 0} \cup \{\infty\}$      $\min(a, b)$    $\min(a, b)$    $\infty$    $0$         $>$

Definition 10. $V_Q(w) : W \to Q$ is a function that, given a service $w$, returns its aggregated QoS value. This is defined as:

$$V_Q(w) = \begin{cases} \max_{i \in In_w}\left(V_Q^{in}(i)\right) \oplus F_Q(w) & \text{if } In_w \neq \emptyset \\ F_Q(w) & \text{if } In_w = \emptyset \end{cases} \qquad (1)$$

Informally, this function calculates the aggregated QoS of a service by taking the worst value of the QoS of its inputs plus the current QoS value of the service itself. Taking for example the service Premium Geoloc Service from Fig. 1, $V_{Q_{RT}}(Premium\ Geoloc\ Service)$ is computed as $\max(V_{Q_{RT}}^{in}(ont3{:}IP\_Address), V_{Q_{RT}}^{in}(ont4{:}ClientID)) \oplus 40\,ms$, which is $\max(0\,ms, 20\,ms) \oplus 40\,ms = 60\,ms$.

Definition 11. $V_Q^{out}(o_w) : O \to Q$ is a function that, given an output of a service $w$, $o_w \in O$, returns its aggregated QoS value. The aggregated QoS of an output is equal to the aggregated QoS of its service. Thus, it is defined as:

$$V_Q^{out}(o_w) = V_Q(w) \qquad (2)$$

For example, the aggregated QoS of the output ont1:Place ($V_{Q_{RT}}^{out}(ont1{:}Place)$) is equal to the aggregated QoS of its service Premium Geoloc Service ($V_{Q_{RT}}(Premium\ Geoloc\ Service)$), which is equal to 60 ms.

Definition 12. $V_Q^{in}(i_w) : I \to Q$ is a function that, given an input of a service $w$, $i_w \in I$, returns its optimal aggregated QoS value. This function is defined as:

$$V_Q^{in}(i_w) = \begin{cases} \text{undefined} & \text{if } d^{-}_{G_S}(i_w) = 0 \\ V_Q^{out}(o_{w'}),\ o_{w'} \in par_{G_S}(i_w) & \text{if } d^{-}_{G_S}(i_w) = 1 \\ \min_{\forall o_{w'} \in par_{G_S}(i_w)}\left(V_Q^{out}(o_{w'})\right) & \text{if } d^{-}_{G_S}(i_w) > 1 \end{cases} \qquad (3)$$

Given an input $i_w \in In_w$ of a service $w$, this function returns the accumulated QoS for that input. If the evaluated input is not matched by any output ($d^{-}_{G_S}(i_w) = 0$), then the accumulated QoS of the input is undefined. If the evaluated input is matched by just one output ($d^{-}_{G_S}(i_w) = 1$), then its accumulated QoS value is equal to the accumulated QoS of that output. If the evaluated input can be matched by more than one output ($d^{-}_{G_S}(i_w) > 1$), i.e., there are many services that can match that input, then its accumulated QoS value is computed by selecting the optimal (best) QoS.

For example, the optimal aggregated QoS of the input ont3:Payment from Transaction Service ($V_{Q_{RT}}^{in}(ont3{:}Payment)$) is calculated as $\min(V_{Q_{RT}}^{out}(ont3{:}PaymentID), V_{Q_{RT}}^{out}(ont5{:}PayInfo)) = 70\,ms$.

Definition 13. We define $V_Q(g) : G \to Q$ as a function that, given a Service Match Graph $g = (V, E)$, returns its optimal aggregated QoS value. This is defined as:

$$V_Q(g) = V_Q(S_i),\ S_i \in V \qquad (4)$$

Basically, the optimal QoS of a Service Match Graph $G_S$ corresponds to the optimal aggregated QoS of its service $S_i \in G_S$.

4.4 Composition Problem 

Given a composition request $R = \{I_R, O_R\}$, a set of 
semantic services $W$, a semantic model and a QoS algebra, 
the composition problem considered in this paper 
consists of generating the Service Match Graph $G_S$ and 
selecting a composition graph $G_c \subseteq G_S$ such that: 

1) $\forall G'_c \subseteq G_S,\ V_Q(G_c) \preceq V_Q(G'_c)$, i.e., the composition 
graph has the best possible QoS 

2) $W_R \subseteq V$, $|W_R|$ is minimized (the composition 
graph contains the minimum number of services) 


 1: function SERVICEMATCHGRAPH(R = {I_R, O_R}, W) 
 2:   C := I_R; W' := W; W_R := {S_o, S_i} 
 3:   unmatchedIn := [ ]; availCon := I_R 
 4:   repeat 
 5:     W_selected := ∅ 
 6:     W_rel := {w ∈ W' | availCon ⊗ In_w ≠ ∅} 
 7:     W_rel := W_rel \ W_R 
 8:     for all w_i = {In_wi, Out_wi} ∈ W_rel do 
 9:       U_set := unmatchedIn[w_i] 
10:       M_set := C ⊗ U_set 
11:       unmatchedIn[w_i] := U_set \ M_set 
12:       if unmatchedIn[w_i] = ∅ then 
13:         W_selected := W_selected ∪ w_i 
14:         availCon := availCon ∪ Out_wi 
15:     W' := W' \ W_selected 
16:     W_R := W_R ∪ W_selected 
17:     C := C ∪ availCon 
18:     availCon := ∅ 
19:   until W_selected = ∅ 
20:   return COMPUTE-GRAPH(W_R) 

Fig. 2. Algorithm for generating a Service Match Graph 
from a composition request R and a set of services W. 


5 Composition Algorithm 

On the basis of the formal definition of the automatic 
QoS-aware composition problem, in this section we 
present our hybrid strategy for the automatic, 
large-scale composition of services with optimal QoS, 
minimizing the services involved in the composition. The 
approach works as follows: given a request, a directed 
graph with the relevant services for the request is generated. 
Once the graph is built, an optimal label-correcting 
forward search is performed in polynomial time in order 
to compute the global optimal QoS. This information is 
used later in a multi-step pruning phase to remove 
suboptimal services. Finally, a hybrid local/global search is 
performed within a fixed time limit to extract the optimal 
solution from the graph. The local search returns a 
near-optimal solution fast, whereas the global search performs 
an incremental search to extract the composition with 
the minimum number of services in the remaining time. 
In this section we explain each step of the algorithm, 
namely: 1) generation of the Service Match Graph; 2) 
calculation of the optimal end-to-end QoS; 3) multi-step 
graph optimizations; and 4) hybrid algorithm. 

5.1 Generation of the Service Match Graph 

Given a composition request, which specifies the inputs 
provided by the user as well as the outputs it expects 
to obtain, and a set of available services, the first step 
consists of locating all the relevant services that can 
be part of the final composition, as well as computing 
all possible matches between their inputs and outputs, 
according to the semantic model presented in Sec. 4.1. 
The output of this step is a Service Match Graph that 
contains many possible valid compositions for the request, 
as the one represented in Fig. In a nutshell, 
the graph is generated by selecting 
all invocable services layer by layer, starting with $S_o$ in 
the first layer (the source service whose outputs are the 
inputs of the request) and terminating with $S_i$ in the last 
layer (the sink service whose inputs are the outputs of 
the request) p5) . 

The pseudocode of the algorithm is shown in Fig. 2. 
The algorithm runs in polynomial time, selecting 
$W_{selected} \subseteq W$ services at each step. At each layer, the 
algorithm finds a potential set of relevant services whose 
inputs are matched by some outputs generated in the 
previous layer using the $\otimes$ operator (L.6). Then, for each 
potentially eligible service, the algorithm checks whether 
the service is invocable (i.e., all its inputs are 
matched by outputs of previous layers) by checking whether 
all the unmatched inputs of the service are now matched. 
All the inputs that are matched are removed from the 
set of unmatched inputs of the current service (L.11). 
If the service is invocable (it has no unmatched inputs), it 
is selected and its outputs are added to the set of 
available concepts. If the service still has some unmatched 
inputs, these inputs are stored in a map so that they can 
be checked again in the next layer. For example, the first eligible 
services for the request shown in Fig. 3 are the services in 
layer L1, which correspond to the services whose 
inputs are fully matched by $I_R$ (the set of output concepts 
produced in L0). The second eligible services are those 
(placed in L2) whose inputs are fully matched 
by the outputs of the previous layers, and so on. The 
algorithm stops when no more services are added to 
the set of selected services. Finally, COMPUTE-GRAPH 
computes all possible matches between the outputs and 
the inputs of the selected services. The output of this 
process is a complete Service Match Graph that can contain 
cycles, as the one depicted in Fig. 

Fig. 3. Graph example with the solution with optimal QoS 
and minimum number of services highlighted (layers L0 to L5). 
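
The layer-by-layer selection described above can be sketched in Python as follows. This is a simplified illustration and not the algorithm of Fig. 2: it assumes that two concepts match only when they are identical (no semantic reasoning), it rechecks all the inputs of every pending service at each layer instead of keeping a map of unmatched inputs, and the function name and the toy services are hypothetical.

```python
def relevant_services(request_inputs, services):
    """Select invocable services layer by layer.

    services maps a service name to a pair (inputs, outputs) of sets.
    Returns the list of layers; each layer is a sorted list of names.
    """
    available = set(request_inputs)   # concepts produced so far
    remaining = dict(services)        # services not selected yet
    layers = []
    while True:
        # A service is invocable when all its inputs are available.
        layer = sorted(name for name, (ins, _) in remaining.items()
                       if ins <= available)
        if not layer:
            return layers             # no new service was selected
        for name in layer:
            available |= remaining.pop(name)[1]
        layers.append(layer)


toy_services = {
    "A": ({"x"}, {"y"}),
    "B": ({"y"}, {"z"}),
    "C": ({"x", "z"}, {"w"}),
    "D": ({"q"}, {"r"}),   # never invocable: "q" is never produced
}
layers = relevant_services({"x"}, toy_services)
```

In this toy repository, service D is never selected because none of the previous layers produces its input, so it would not appear in the resulting graph.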

5.2 Optimal end-to-end QoS 

Once the Service Match Graph is computed for a composition 
request, the next step is to calculate the best 
end-to-end QoS achievable in the Service Match Graph. The 
optimal end-to-end QoS can be computed in polynomial 
time using a shortest path algorithm to calculate the best 
aggregated QoS values for each input and output of the 
graph, i.e., the best QoS values at which the outputs can 
be generated and the inputs are matched. In order to 
compute the optimal QoS, we use a generalized 
Dijkstra-based label-setting algorithm computed forwards from 
$S_o$ to $S_i$ p^ , based on the algebraic model of the QoS 
presented in Sec. The optimality of the algorithm is 
guaranteed as long as the function defined to aggregate 
the QoS values ($\otimes$) is monotonic, in order to satisfy the 
principle of optimality. A proof can be found in [ [3^ . 

Fig. 4 shows the pseudocode of the generalized 
Dijkstra-based label-setting algorithm. The algorithm 
starts by assigning an infinite QoS cost to each input of the 
graph in the table qos. An infinite cost for an input means 
that the input is still not resolved. The first service to 
be processed is $S_o$. Each time a service $w$ is extracted 
from the queue, the best accumulated QoS cost of each 
input $i_{w'}$ matched by the outputs of the service $w$ is 
recalculated. If there is an improvement (i.e., a match 
with a better QoS is discovered), the affected service $w'$ is 
stored in updated so that its aggregated QoS can be recomputed. 
Finally, for each service $w \in updated$, we recompute 
its aggregated QoS using the updated values of each 
affected input. If the QoS has improved, the service 
is added to the queue to be expanded later. 
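
The propagation described above can be sketched in Python for the response time property. The sketch makes several assumptions that are not part of the original algorithm: concepts match only when they are identical, the best QoS is stored per concept rather than per service input, a service costs the maximum of its inputs plus its own response time, and the names `optimal_qos`, `BusinessInfo` and the concept labels are hypothetical. The response times of the two geolocation services are taken from the running example.

```python
import heapq
import math


def optimal_qos(services, source="So"):
    """services maps a name to (inputs, outputs, response_time).

    Returns (best QoS per concept, aggregated QoS per service).
    """
    best_input = {}  # concept -> best QoS at which it is available
    service_cost = {name: math.inf for name in services}
    service_cost[source] = services[source][2]
    queue = [(service_cost[source], source)]
    while queue:
        cost, w = heapq.heappop(queue)
        if cost > service_cost[w]:
            continue  # stale queue entry
        # Relax every concept produced by w.
        improved = set()
        for concept in services[w][1]:
            if cost < best_input.get(concept, math.inf):
                best_input[concept] = cost
                improved.add(concept)
        # Recompute the services that consume an improved concept.
        for name, (ins, _, own) in services.items():
            if ins & improved and all(c in best_input for c in ins):
                new_cost = max(best_input[c] for c in ins) + own
                if new_cost < service_cost[name]:
                    service_cost[name] = new_cost
                    heapq.heappush(queue, (new_cost, name))
    return best_input, service_cost


toy = {
    "So": (set(), {"ip", "business"}, 0),
    "BusinessInfo": ({"business"}, {"client"}, 20),
    "PremiumGeoloc": ({"ip", "client"}, {"place"}, 40),
    "FreeGeoloc": ({"ip"}, {"place"}, 180),
}
best, cost = optimal_qos(toy)
```

In this toy graph the concept `place` is available at 60 ms through the premium service, even though the free service can also produce it at 180 ms.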

5.3 Graph optimizations 

Finding the composition with the minimum number of 
services is a very hard combinatorial problem which, 


 1: function QOS-UPDATE(G_S = {V, E}) 
 2:   /* qos is a table indexed by inputs (i) 
 3:      associated to their aggregated QoS (q) */ 
 4:   qos[i, q] ← [ ] 
 5:   for all i ∈ I, I ⊂ V do 
 6:     qos[i] ← ∞ 
 7:   queue ← S_o 
 8:   while queue ≠ ∅ do 
 9:     /* Queue sorted by aggregated QoS */ 
10:     w ← POP(queue) 
11:     updated ← {} 
12:     for all o_w ∈ Out_w do 
13:       for all i_w' ∈ ch_G(o_w) do 
14:         if V_Q(w) ≺ qos[i_w'] then 
15:           qos[i_w'] ← V_Q(w) 
16:           updated ← updated ∪ w' 
17:     for all w ∈ updated do 
18:       if cost of w has been improved then 
19:         queue ← INSERT(w, queue) 
20:   return qos 

Fig. 4. Dijkstra-based algorithm to compute the best QoS 
for each input and output in the Service Match Graph G_S. 


in most cases, has a very large search space, mainly 
determined by the size of the Service Match Graph. In 
order to improve the scalability with the number of 
services, we apply a set of admissible optimizations to 
reduce the search space. At each pass, the algorithm 
analyzes different criteria to identify services that are 
redundant or can be substituted by better ones, so the 
size of the graph decreases monotonically. The 
passes that are sequentially applied are: 1) elimination 
of services that do not contribute to the outputs of the 
request; 2) pruning of services that lead to suboptimal 
QoS; 3) combination of interface (inputs/outputs) and 
QoS equivalent services; and 4) replacement of interface 
and QoS dominated services. These optimizations are 
an extension of the optimizations presented in pO] to 
support QoS. 

The first pass selects the set of reachable services in 
the Service Match Graph. Starting from the inputs of $S_i$, it 
selects all those services whose outputs match any inputs 
of $S_i$. This step is repeated with the new services until 
the empty set is selected. Those services that were not 
selected do not contribute to the expected outputs of the 
composition and can be safely removed from the graph. 

The second pass prunes the services of the graph that 
are suboptimal in terms of QoS, i.e., they cannot be part 
of any optimal QoS composition. To do so, we compute 
the maximum admissible QoS bound for each input in 
the graph. In a nutshell, the maximum bound of the 
inputs of a service $w$ can be calculated by selecting the 
maximum QoS bound among the bounds of all inputs 
matched by the outputs of the service $w$ and subtracting 
the QoS of $w$. This can be recursively defined as: 

$$max_Q(i_w) = \begin{cases} V_Q(w) \ominus F_Q(w) & \text{if } Out_w = \emptyset \\ \max_{i_{w'}} \big(max_Q(i_{w'})\big) \ominus F_Q(w) & \text{if } Out_w \neq \emptyset \end{cases}$$ 

where the maximum is taken over the inputs $i_{w'}$ matched by the outputs of $w$. 

The value of $max_Q$ for each input in the graph can 
be easily calculated by propagating the bounds from $S_i$ 
to $S_o$. For example, in Fig. Id we start computing the 
maximum bound of the inputs of $S_i$ (xsd:boolean). Since 
$S_i$ has no outputs, $max_Q$(xsd:boolean) is calculated as 
$V_Q(S_i) \ominus F_Q(S_i)$ = 410 ms - 0 ms = 410 ms. Then, we select all 
the services whose outputs match xsd:boolean. In this case 
there is just one service, ML Predictor Service. The bounds 
of its inputs are now computed by subtracting 
$F_Q$(ML Predictor Service) from the maximum bound of 
the inputs that this service matches. Since there is just 
one input matched (xsd:boolean from $S_i$) whose bound is 
410 ms, we have $max_Q(i)$ = 410 ms - 210 ms = 200 ms for 
each input $i$ of the service. In the next step, we have three 
services that match the newly calculated inputs (Free Geoloc 
Service, Premium Geoloc Service and Transaction Service). 
The maximum bounds of the inputs of these services are 
200 ms - 180 ms = 20 ms, 200 ms - 40 ms = 160 ms and 
200 ms - 130 ms = 70 ms, respectively. Note that, since 
the maximum bound of Transaction Service is 70 ms, the 
service Secure Payment is out of the bounds (its output 
QoS is 80 ms), so it can be safely pruned. 
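
The backward propagation of this example can be reproduced with a short Python sketch. It assumes response time (so that the subtraction operator is an ordinary subtraction) and an acyclic graph, and it stores one bound per service because, in the formula, all the inputs of a service share the same bound. The function name and the `consumers` structure are hypothetical; the numeric values are the ones given in the example.

```python
def max_bounds(own_qos, consumers, optimal):
    """Maximum admissible QoS bound of the inputs of each service.

    own_qos:   service name -> its own response time F_Q
    consumers: service name -> services fed by its outputs
    optimal:   optimal end-to-end QoS, i.e. V_Q of the sink service
    """
    bounds = {}

    def bound(w):
        if w not in bounds:
            fed = consumers.get(w, [])
            if not fed:  # no outputs: the sink service
                bounds[w] = optimal - own_qos[w]
            else:
                bounds[w] = max(bound(c) for c in fed) - own_qos[w]
        return bounds[w]

    for w in own_qos:
        bound(w)
    return bounds


own_qos = {
    "Si": 0,
    "ML Predictor Service": 210,
    "Free Geoloc Service": 180,
    "Premium Geoloc Service": 40,
    "Transaction Service": 130,
}
consumers = {
    "ML Predictor Service": ["Si"],
    "Free Geoloc Service": ["ML Predictor Service"],
    "Premium Geoloc Service": ["ML Predictor Service"],
    "Transaction Service": ["ML Predictor Service"],
}
bounds = max_bounds(own_qos, consumers, optimal=410)

# A provider whose output QoS exceeds the bound of the input it matches
# cannot be part of an optimal composition (Secure Payment: 80 ms > 70 ms).
secure_payment_pruned = 80 > bounds["Transaction Service"]
```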

The third and the fourth passes analyze service equivalences 
and dominances in the Service Match Graph. It is 
very frequent to find services from different providers 
that offer similar functionality with overlapping interfaces 
(inputs/outputs). In scenarios like this, it is easy to end 
up with a large Service Match Graph, which makes it very hard 
to find optimal compositions in reasonable time. One 
way to reduce the complexity without losing information 
is to analyze the interface equivalence and dominance 
between services in order to combine those that are 
equivalent, or replace those that are dominated in terms of the 
interface they provide and the QoS they offer. In a 
nutshell, we check three objectives to compare services: the 
amount of information they need to be invoked (inputs), 
the amount of information they return (outputs), and 
their QoS. If a set of services are equal in all objectives, 
they are equivalent and they can be combined into an 
abstract service with several possible implementations. 
If a service is at least equal in all objectives and better in 
at least one objective (it requires less information to be invoked, 
produces more information or has a better QoS), then 
the service dominates the other service. A more detailed 
description of the interface and dominance optimizations 
can be found in p0| . 
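
The three-objective comparison can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the interface is compared on the raw sets of input and output concepts (rather than on the matches of the Service Match Graph), the QoS is a response time where lower is better, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    inputs: frozenset
    outputs: frozenset
    response_time: float  # lower is better


def equivalent(a, b):
    """Equal in the three objectives: inputs, outputs and QoS."""
    return (a.inputs == b.inputs and a.outputs == b.outputs
            and a.response_time == b.response_time)


def dominates(a, b):
    """a is at least as good as b everywhere and better somewhere."""
    at_least_as_good = (a.inputs <= b.inputs        # needs no more
                        and a.outputs >= b.outputs  # returns no less
                        and a.response_time <= b.response_time)
    return at_least_as_good and not equivalent(a, b)


a = Service("a", frozenset({"x"}), frozenset({"y", "z"}), 10)
b = Service("b", frozenset({"x", "k"}), frozenset({"y"}), 10)
```

Here `a` dominates `b` because it needs fewer inputs and returns more outputs with the same response time, so `b` could be replaced by `a`.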

Note that optimizations are applied right after all 
semantic matches are computed in the Service Match 
Graph, since the optimizations are based on the analysis 
of the I/O matches among services. For this reason, they 
cannot be applied during the calculation of the graph 
(this would require precomputing missing 
relations during the graph generation, which does not 
provide any benefit as this is what the Service Match 
Graph generation algorithm already does). On the other 
hand, optimizations are applied sequentially to save 
computation time, since the number of services in the 
graph decreases monotonically in each step. In order to 
take advantage of this, faster optimizations are applied 
first so that the slower optimizations in the pipeline can 
work with a reduced set of services. 


5.4 Hybrid algorithm 

Each service in the composition graph may have different 
services that match each input, thus there may 
exist multiple combinations of services that satisfy the 
composition request with the same or different QoS. The 
goal of the hybrid search is to extract good solutions from 
the composition graph, optimizing the total number of 
services involved in the composition and guaranteeing 
the optimal QoS. Thus, for each input we select just one 
service of the graph to match that input, until the best 
combination is found. The hybrid search performs a local 
search to extract a good solution and, in the remaining 
time, it tries to improve the solution by running a global 
search. 

Fig. 5 shows the pseudocode of the local search 
strategy. The algorithm starts with a composition graph, 
the inputs of the service $S_i$ marked as unresolved (the 
expected outputs of the request) and the service $S_i$ 
selected to be part of the solution. An unresolved input is 
an input that can be matched by many different outputs 
but for which no decision has been made yet. Using the list of 
unresolved inputs to be matched, the method 
RANK-RESOLVERS returns a list of services that match any 
of the unresolved inputs. Services are ranked according 
to the number of unresolved inputs that they match, so the 
service that matches the most inputs is considered first to 
be part of the solution. Then, for each input that the 
selected service can match, the method CYCLE performs 
a forward search to check whether resolving the selected input 
with that service leads to a cycle. For example, in Fig. 
1^ if we select the service K to match the input of / 
after having decided to resolve the input of K with the 
service I, we end up with an invalid composition, so 
K is an invalid resolver for / and it must be discarded. 
Once all resolvable inputs are collected in resolved, the 
method RESOLVE creates a copy of the current graph 
where the inputs in resolved are matched only by the 
selected service, i.e., any other match between any output 
from a different service to that input is removed from the 
graph. If the selected service was not already selected, 
then all its inputs are marked as unresolved and 
a recursive call to LSBT is performed to select a new 
service to resolve the remaining inputs, until a solution 
is found. If a dead end is reached (a solution that has no 
services to resolve the remaining inputs without cycles) 

function LOCAL-SEARCH(G_S = {V, E}) 
  return LSBT(G_S, In_Si, {S_i}) 

function LSBT(G_S, unresolved, services) 
  if unresolved = ∅ then return G_S 
  servs ← RANK-RESOLVERS(unresolved) 
  for each w ∈ servs do 
    resolved ← {} 
    matched ← Out_w ⊗ unresolved 
    for each input ∈ matched do 
      if ¬CYCLE(G_S, w, input) then 
        resolved ← resolved ∪ input 
    if resolved ≠ ∅ then 
      unresolved ← unresolved \ resolved 
      if w ∉ services then 
        unresolved ← unresolved ∪ In_w 
      G'_S ← RESOLVE(G_S, w, resolved) 
      services ← services ∪ w 
      result ← LSBT(G'_S, unresolved, services) 
      if result ≠ fail then return result 
  return fail 

Fig. 5. Local search algorithm to extract a composition 
from a graph. 


the algorithm backtracks to a previous state to try a 
different service (L0. 

An implementation of the CYCLE method is provided 
in Fig. 6. The algorithm performs a look-ahead check in a 
breadth-first fashion to determine whether matching the 
selected input $i$ with an output of the service $w$ leads to 
a cyclic dependency. This is done by traversing only the 
resolved matches, i.e., inputs that are matched by just 
one output of a service, until the selected service $w$ is 
reached, proving the existence of a cycle. A more 
memory-efficient implementation of the cycle check can 
be obtained using Tarjan's strongly connected components 
algorithm p8| , stopping at the first strongly connected 
component detected. 
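
A minimal Python sketch of this breadth-first look-ahead is given below. It assumes that the resolved matches have already been collapsed into a service-level adjacency map (`resolved_edges`, a hypothetical structure that is not part of the original pseudocode), so that only matches with a single provider are traversed.

```python
def leads_to_cycle(resolved_edges, w, target):
    """Check whether resolving an input of `target` with `w` closes a cycle.

    resolved_edges maps a service to the set of services that consume
    one of its outputs through an already resolved match. A cycle exists
    if `w` can be reached from `target` following those matches.
    """
    visited = {target}
    frontier = {target}
    while frontier:
        reached = set()
        for service in frontier:
            for successor in resolved_edges.get(service, ()):
                if successor == w:
                    return True
                if successor not in visited:
                    reached.add(successor)
        visited |= reached
        frontier = reached
    return False


# Resolved matches: A feeds B, and B feeds C.
edges = {"A": {"B"}, "B": {"C"}}
closes_cycle = leads_to_cycle(edges, "C", "A")   # C would feed A
no_cycle = leads_to_cycle(edges, "A", "C")       # A would feed C
```

Using service C to resolve an input of A would close the loop A, B, C, A, whereas using A to resolve an input of C only adds a shortcut.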

After the local search is used to find a good solution, 
the global search is performed in the remaining time 
to obtain a better solution by exhaustively exploring the 
space of possible solutions. In a nutshell, this algorithm 
works as follows: Given a Service Match Graph Gs, with 
some unresolved inputs, which initially are the inputs 
of the service Si, the algorithm selects an input to be 
resolved and for each service candidate that can be used 
to resolve that input, it generates a copy of the graph Gs 
but with the input resolved (i.e., the selected service is 
the only one that matches the unresolved input). The 
algorithm enqueues each new graph to be expanded 
again, and repeats the process by extracting the graph 
with the minimum number of services from the queue, 
until it eventually finds a graph with no unresolved 
inputs. 

Fig. 7 shows the pseudocode of the global search 


function CYCLE(G_S = {V, E}, w, i_w') 
  W_visited ← {w'} 
  W_new ← {w'} 
  while W_new ≠ ∅ do 
    W_reached ← {} 
    for all w_n ∈ W_new do 
      for all o_wn ∈ Out_wn do 
        for all i_w'n ∈ ch_G(o_wn) do 
          if d_G^-(i_w'n) = 1 ∧ w'_n ∉ W_visited then 
            if w'_n = w then return true 
            W_reached ← W_reached ∪ w'_n 
    W_new ← W_reached 
    W_visited ← W_visited ∪ W_new 
  return false 

Fig. 6. Naive breadth-first-search algorithm to check 
whether using the service w to resolve the input i_w' of 
a service w' leads to a cycle. 


algorithm. The algorithm starts by computing the optimal 
QoS of the graph with the method QoS-UPDATE. This 
method returns a key-value table qos[i, q] where each 
key corresponds to an input $i$ of the graph, and each 
value $q$ is its optimal aggregated QoS, $q = V_Q^{in}(i)$. Then, 
the inputs of the service $S_i$ of the graph are added 
to $I_{un}$ to mark them as unresolved (L.8). In order to 
minimize the number of possible candidates for each 
unresolved input, we compute and propagate a range 
of valid QoS values, called QoS bounds, defined 
as an interval [min, max]. These bounds determine the 
range of valid accumulated QoS values of the outputs 
that can be used to match each of the unresolved inputs 
without exceeding the optimal end-to-end QoS of the 
final composition. The min value is the optimal QoS 
for the input, i.e., there is no output in the graph that 
can match the input with a lower QoS, whereas the 
max value is the maximum QoS value supported. If 
this bound is exceeded, the total aggregated QoS of 
the composition worsens. For example, in Fig. the 
bounds of the input ont4:ClientID of the service Premium 
Geoloc Service are [20 ms, 160 ms]. If we exceed the min 
bound (20 ms), the output QoS of the service gets worse 
(> 60 ms), which also affects the optimal QoS of the input 
ont1:Location. However, as long as the max bound is not 
exceeded (< 160 ms), the optimal accumulated QoS of the 
ML Predictor Service would not be affected. 

The method COMPUTE-$V_Q$ is used to compute the 
value of the $V_Q$ function (Eq. ) using the best QoS 
values of inputs, stored in qos (qos[i] = $V_Q^{in}(i)$). A 
tuple $(G_S, I_{un}, qos, W_{sel})$, where $G_S$ is the current graph, 
$I_{un}$ are the unresolved inputs of $G_S$, qos holds the best 
aggregated QoS values for each input in $G_S$ and $W_{sel}$ is 
the set of selected services, defines the components 
of a partial solution. Each partial solution is stored in 
a priority queue, which is sorted by the number of 
services $|W_{sel}|$. This allows an exploration of the search 
space in a breadth-first fashion, so the solution with the 
minimum number of services is always expanded first. 
At each iteration, a partial solution is extracted from 
the queue to be refined (L.12). If the partial solution 
has no unresolved inputs, the solution is complete, and 
has the minimum number of services. If the partial 
solution still has some unresolved inputs, it is refined by 
selecting an unresolved input with the method SELECT. 
This method selects the input to be resolved using a 
minimum-remaining-values heuristic. This heuristic always 
selects the input with the fewest resolvers (candidate services) 
in order to minimize the branching factor. The list of 
services that can match the selected input with a total 
aggregated QoS value within the [min, max] bound is 
calculated with the method RESOLVERS. For each valid 
service, the algorithm performs a look-ahead search to 
check whether using the current service to resolve the 
selected input leads to an unavoidable cycle. If so, the 
service is discarded early to save computation 
time and space. If it does not lead to a cycle, then a 
copy of the graph ($G'_S$) with the selected input resolved 
is generated, and the input is also removed from the 
set of unresolved inputs. Using the optimal aggregated 
QoS values for the inputs of the graph, stored in qos, 
the algorithm computes the aggregated QoS value of the 
service $w$. If this value is worse than the min bound 
(COMPUTE-$V_Q(w, qos') \succ min$), then the aggregated 
QoS value of some inputs and outputs of the graph may 
be affected. Thus, the QoS values for 
each input and output are propagated again over the new 
graph $G'_S$. For example, if the Business Service Info 
increments its response time to 40 ms, a repropagation 
is required to recompute the accumulated QoS of all the 
services that may be affected. In this case, the Premium 
Geoloc Service increments its accumulated QoS cost from 
60 ms to 80 ms, which also affects the optimal QoS of the input 
ont1:Location. 

Finally, if the current service is not part of the current 
solution, its inputs are added to the unresolved table, and 
a new bound for each input is computed. The min bound 
corresponds to the optimal value, which is stored in 
qos'. In order to compute the max bound, we need to 
subtract the QoS of the selected service ($F_Q(w)$) from the 
max bound of the resolved input, using the operator $\ominus$ 
(L.25). This new partial solution is inserted in the queue 
to be expanded later on. 
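
The core idea of the global search (expanding first the partial solution with the fewest services and picking the unresolved input with the fewest candidates) can be sketched in Python as follows. This is a deliberately simplified illustration and not the algorithm of Fig. 7: it ignores the QoS bounds and the cycle check, assumes an acyclic graph with exact concept matching, and resolves each concept only once. All names and the toy data are hypothetical.

```python
import heapq
from itertools import count


def min_service_composition(providers, inputs_of, goal_inputs):
    """Best-first search for a composition with the fewest services.

    providers:   concept -> list of services able to produce it
    inputs_of:   service -> set of concepts the service requires
    goal_inputs: concepts required by the sink service
    """
    tie = count()  # tie-breaker so the heap never compares sets
    queue = [(0, next(tie), frozenset(goal_inputs),
              frozenset(), frozenset())]
    while queue:
        _, _, unresolved, selected, resolved = heapq.heappop(queue)
        if not unresolved:
            return set(selected)  # first complete solution is minimal
        # Minimum-remaining-values: input with the fewest candidates.
        target = min(unresolved,
                     key=lambda c: (len(providers.get(c, ())), c))
        for w in providers.get(target, ()):
            new_unresolved = unresolved - {target}
            new_resolved = resolved | {target}
            if w not in selected:
                new_unresolved |= inputs_of[w] - new_resolved
            new_selected = selected | {w}
            heapq.heappush(queue, (len(new_selected), next(tie),
                                   new_unresolved, new_selected,
                                   new_resolved))
    return None  # dead end: some input has no provider


providers = {"g": ["A", "B"], "x": ["C"], "y": ["D"],
             "z": ["E"], "in": ["So"]}
inputs_of = {"A": {"x", "y"}, "B": {"z"}, "C": {"in"},
             "D": {"in"}, "E": {"in"}, "So": set()}
solution = min_service_composition(providers, inputs_of, {"g"})
```

In this toy problem the goal can be produced through A (which needs C and D) or through B (which needs only E), and the search returns the smaller composition through B.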


6 Evaluation 

In order to evaluate the performance of the proposed 
approach, we conducted two different experiments. In 
the first experiment, we evaluated the approach using 
the datasets of the Web Service Challenge 2009-2010. 
The goal of this first experiment was to evaluate the 
performance and scalability of the proposed approach on 
large-scale service repositories. In the second experiment, 
we tested the algorithm with five random datasets in 
order to better analyze the differences of the performance 

 1: function GLOBAL-SEARCH(G_S) 
 2:   qos[i, q] ← QOS-UPDATE(G_S) 
 3:   max ← COMPUTE-V_Q(S_i, qos) 
 4:   W_sel ← {S_i} 
 5:   /* I_un is a key-value table where the keys are 
 6:      unresolved inputs and the values their QoS bounds */ 
 7:   for i_Si ∈ In_Si do 
 8:     I_un[i_Si] ← [qos[i_Si], max] 
 9:   /* Queue sorted by |W_sel| */ 
10:   queue ← INSERT((G_S, I_un, qos, W_sel), queue) 
11:   while queue ≠ ∅ do 
12:     (G_S, I_un, qos, W_sel) ← POP(queue) 
13:     if I_un = ∅ then return G_S 
14:     input ← SELECT(I_un) 
15:     [min, max] ← I_un[input] 
16:     for all w ∈ RESOLVERS(input, [min, max]) do 
17:       if ¬CYCLE(G_S, w, input) then 
18:         G'_S ← RESOLVE(G_S, w, {input}) 
19:         I'_un ← REMOVE(input, I_un) 
20:         qos' ← qos 
21:         if COMPUTE-V_Q(w, qos') ≻ min then 
22:           qos' ← QOS-UPDATE(G'_S) 
23:         if w ∉ W_sel then 
24:           max' ← max ⊖ F_Q(w) 
25:           for i_w ∈ In_w do 
26:             min' ← qos'[i_w] 
27:             I'_un[i_w] ← [min', max'] 
28:         queue ← INSERT((G'_S, I'_un, qos', W_sel ∪ w), queue) 
29:   return fail 

Fig. 7. Global search algorithm to extract the optimal 
composition. 


between the local and the global search. All tests were 
executed with a time limit of 5 min. Solutions produced 
by our algorithm are represented as Service Composition 
Graphs (no BPEL was generated). 

6.1 Web Service Challenge 2009-2010 datasets 

The datasets of the Web Service Challenge 2009-2010 
range from 572 to 15,211 services with two different 
QoS properties: response time and throughput. Table 2 
shows the results obtained for each dataset and for 
each QoS property. The response time is the average 
time (measured in milliseconds) that a service takes to 
respond to a request. The throughput, as defined in 
the WSC, is the average rate of invocations per second 
supported by a service. 

Row #Graph services shows the number of services of 
the composition graph and #Graph services (opt) the number 
of services after applying the graph optimizations. 
As can be seen, the optimizations reduce the number of 
services in the initial composition graph by 64% on average. 
This indicates that equivalence and dominance 
analysis of the QoS and the functionality of services is a 
powerful technique to reduce the search space in 
large-scale problems. Rows Local Search and Global Search show 
the number of services of the solution obtained with 
each respective method as well as the total amount of 
time spent in the search. The global search found the 
best solution for each dataset and for each QoS property, 
except for dataset D-04, where the composition with the 
minimum number of services could not be found due 
to combinatorial explosion. However, in those cases, the 
local search strategy is able to find an alternative solution 
very fast. Note also that, in many cases, the local search 
obtains the best solution (comparing it with the global 
search), except for the throughput in datasets D-03 and D-05. 

TABLE 2 

Validation with the WSC 2009-2010 

                                      D-01     D-02     D-03     D-04     D-05 
#Services in the dataset               572    4,129    8,138    8,301   15,211 

Validation with Response Time 
Optimal Response Time (ms)             500    1,690      760    1,470    4,070 
#Graph services                         81      141      154      331      238 
#Graph services (opt)                   21       57       15      160      126 
Local Search    #Services                5       20       10       40       32 
                Time (s)             0.613    0.988    2.608    7.767    2.920 
Global Search   #Services                5       20       10        -       32 
                Time (s)             0.617    1.580    2.613        -   24.971 

Validation with Throughput 
Optimal Throughput (inv/s)          15,000    6,000    4,000    4,000    4,000 
#Graph services                         81      141      154      331      238 
#Graph services (opt)                   10       43       90      156       69 
Local Search    #Services                5       20       15       62       31 
                Time (s)             0.343    1.173    1.933    8.571    2.562 
Global Search   #Services                5       20       10        -       30 
                Time (s)             0.345    1.246    2.085        -  119.322 

We have compared our approach with the top 3 of 
the Web Service Challenge 2010 [40]. Table 3 shows this 
comparison following the same format and the same 
rules of the Web Service Challenge. The format, rules 
and other details of the challenge are described in ||4^. 
The third and fourth columns show the response time and the 
throughput obtained for each dataset. Note that, since all 
these algorithms minimize a single QoS, these values are 
computed by executing the algorithm twice, once for each 
QoS. Unfortunately, the results provided by the WSC 
organization in [40] show only the minimum number 
of services for both executions (fifth column). Thus, 
the number of services obtained for both the response 
time and throughput is unknown, which makes it hard 
to compare with our results. Even so, using the same 
evaluation criteria, our approach obtains the optimal 
QoS for the response time and the throughput, and also 
improves the number of services in D-04 (40 vs 73) and 
D-05 (30 vs 32) with respect to the solutions obtained 
by the winner of the challenge (the minimum number 
of services obtained for each dataset is highlighted). 
The last column shows the total execution time of each 
algorithm. The total time includes the time spent to 
obtain the solution for the response time and for the 
throughput. 

TABLE 3 

Comparison with the top 3 WSC 2010 

Dataset  Algorithm       R. Time (ms)  Through. (inv/s)  Min. Serv.  Time (ms) 
D-01     CAS                      500            15,000           5         78 
         RUG [6]                  500            15,000          10        188 
         Tsinghua [5]             500            15,000           9        109 
         Our approach             500            15,000           5        956 
D-02     CAS                    1,690             6,000          20         94 
         RUG [6]                1,690             6,000          40        234 
         Tsinghua [5]           1,690             6,000          36        140 
         Our approach           1,690             6,000          20      2,171 
D-03     CAS                      760             4,000          10         78 
         RUG [6]                  760             4,000          11        234 
         Tsinghua [5]             760             4,000          18        125 
         Our approach             760             4,000          10      4,693 
D-04     CAS                    1,470             4,000          73        156 
         RUG [6]                1,470             4,000         133        390 
         Tsinghua [5]           1,470             4,000         133        188 
         Our approach           1,470             4,000          40     16,338 
D-05     CAS                    4,070             4,000          32         63 
         RUG [6]                4,070             4,000       4,772        907 
         Tsinghua [5]           4,070             4,000       4,772        531 
         Our approach           4,070             4,000          30    122,242 
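
As a quick sanity check of the 64% average reduction reported for the graph optimizations, the following Python snippet recomputes it from the #Graph services and #Graph services (opt) rows of Table 2. Averaging the per-dataset relative reductions is an assumption about how the figure was obtained; the variable and function names are hypothetical.

```python
# Rows "#Graph services" and "#Graph services (opt)" of Table 2 (D-01 to D-05).
before = [81, 141, 154, 331, 238]
after_response_time = [21, 57, 15, 160, 126]
after_throughput = [10, 43, 90, 156, 69]


def mean_reduction(before, after):
    """Average relative reduction in the number of services."""
    return sum(1 - a / b for a, b in zip(after, before)) / len(before)


reduction_rt = mean_reduction(before, after_response_time)
reduction_tp = mean_reduction(before, after_throughput)
overall = (reduction_rt + reduction_tp) / 2
```

Under this assumption, both QoS properties yield an average reduction slightly above 64%, which is consistent with the value reported in the text.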

Our approach takes, in general, more time to obtain 
a solution. However, it should be noted that we show 
the best results achieved by the hybrid approach, i.e., 
if the global search improves the solution of the local 
search, we show that solution along with the time taken 
by the global search. In any case, the local search always 
provides a first good solution very fast. For example, as 
can be seen in Table 2, the optimal solution for D-05 has 
30 services and has been obtained in 119.322 s, but the 
local search obtained a solution with 31 services in 2.56 
s, still better than the solution with 32 services obtained 
by ID (Table 3). Moreover, it should also be noted that 
the problem of finding the optimal composition with 
minimum number of services and optimal QoS is much 
harder than just optimizing the QoS objective function, 
which is the problem solved by the participants of the 
WSC 2010. Although the problem is intractable and 
requires exponential time, it can be optimally solved for 
many particular instances in a reasonable amount of time 
using adequate optimizations even in large datasets as 
shown in Tables 2 and 3. This is one of the main reasons 
why a combination of a local and global search can 
achieve good results in a wide variety of situations, in 
contrast with pure greedy strategies or with pure global 
optimization algorithms. 

We also compare the results obtained with Chen et 
al. [10], who offer a detailed analysis of their results. 
This comparison is shown in Table 4. Solutions are 
compared according to their QoS and number of services. 
A solution is better if 1) its overall QoS is better or 2) it has 
the same QoS but fewer services. The results show that our 
algorithm always obtains the same or better results. Concretely, 
it finds solutions with optimal QoS and fewer services in 
D-01, D-02, D-04 and D-05 (response time), and D-03 
(throughput). It also finds a solution with a better QoS 
(4,000 inv/s vs 2,000 inv/s) in D-04 (throughput). 

TABLE 4 

Detailed comparison with [10] 

                                      D-01     D-02     D-03     D-04     D-05 
Chen et al.     R. Time (ms)           500    1,690      760    1,470    4,070 
                Services                 8       21       10       42       33 
Our approach    R. Time (ms)           500    1,690      760    1,470    4,070 
                Services                 5       20       10       40       32 
Chen et al.     Throughput (inv/s)  15,000    6,000    4,000    2,000    4,000 
                Services                 5       20       21       40       30 
Our approach    Throughput (inv/s)  15,000    6,000    4,000    4,000    4,000 
                Services                 5       20       10       62       30 


6.2 Randomly generated datasets 

Although the global search is able to obtain solutions 
with a lower number of services, a first look at the 
results with the WSC dataset might suggest that the 
difference between both strategies is not very significant, as 
most of the obtained solutions have the same number 
of services. However, this may be due to a bias in 
the repository, since all the datasets of the WSC are 
generated using the same random model. In order to 
better evaluate and characterize the performance of the 
hybrid algorithm, we generated a new set of five random 
datasets that range from 1,000 to 9,000 services. These 
datasets are available at 
https://wiki.citius.usc.es/inv:downloadable_results:ws-random-qos 
Table 5 shows the solutions obtained. 

TABLE 5 

Validation with random datasets 

                                     R-01     R-02     R-03     R-04     R-05
#Services in the dataset            1,000    3,000    5,000    7,000    9,000

Validation with Response Time
Optimal Response Time (ms)          1,430      975      805    1,225    1,420
#Graph Services                        54      168      285      383      499
#Graph Services (opt)                  22       50       54       56       99
Local Search     #Services              7       18       20       15       19
                 Time (s)           0.183    0.403    0.422    0.515    0.641
Global Search    #Services              7       14       15       15       16
                 Time (s)           0.243    0.767    4.088    0.740    3.131

Validation with Throughput
Optimal Throughput (inv/s)          1,000    2,500    1,500    2,000    2,500
#Graph Services                        54      168      285      383      499
#Graph Services (opt)                  19       46      133      116      103
Local Search     #Services              7       17       24       19       23
                 Time (s)           0.072    0.143    0.606    0.732    0.450
Global Search    #Services              7       12       12       15       16
                 Time (s)           0.155    0.310    2.479    1.485    1.714


We found that in these datasets, the solutions obtained 
with the global search strategy are, on average, 
approximately 16% smaller than the ones obtained with the local 
search, whereas the differences in search time are less 
pronounced than in the previous experiment. These 
findings suggest that the performance of each strategy 
highly depends on the underlying structure of the service 
repository, which is mostly determined by the number of 
services and the existing matching relations. 

In order to test whether these differences are statistically 
significant, we conducted a nonparametric 
test using the binomial sign test for two dependent 
samples with a total of 20 datasets (5 WSC w/ response 
time + 5 WSC w/ throughput + 5 Random w/ response 
time + 5 Random w/ throughput). The null hypothesis 
was rejected with p-value ≈ 0.01 [41], meaning that 
both strategies (local and global search) find significantly 
different solutions. Thus, a hybrid strategy can perform 
better in many different scenarios, since it achieves a 
good tradeoff between quality and execution time. 
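
As an illustration of how a binomial sign test works, the following Python sketch applies a two-sided version to the ten paired observations of Table 5 only. The per-dataset WSC values are not included, so the p-value it prints is not the one reported above. The function name `sign_test` and the rule of discarding ties are assumptions of this sketch, not details taken from the evaluation.

```python
from math import comb

def sign_test(xs, ys):
    """Two-sided binomial sign test for paired samples (ties are discarded)."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0  # no informative pairs
    positives = sum(1 for d in diffs if d > 0)
    k = max(positives, n - positives)
    # P(X >= k) for X ~ Binomial(n, 0.5), doubled for the two-sided test
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Number of services per solution, copied from Table 5
# (R-01..R-05 with response time, then R-01..R-05 with throughput).
local_services = [7, 18, 20, 15, 19, 7, 17, 24, 19, 23]
global_services = [7, 14, 15, 15, 16, 7, 12, 12, 15, 16]

p_value = sign_test(local_services, global_services)
print(p_value)
```

Three of the ten pairs are ties and are dropped; in the remaining seven the local search always uses more services than the global search.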

This evaluation shows that, on one hand, the combination 
of local and global optimization is a general and 
powerful technique to extract optimal compositions in 
diverse scenarios, as it brings the best of both worlds. 
This is especially important when little or nothing 
is known concerning the structure of the underlying 
repository of services. On the other hand, the results 
obtained with the Web Service Challenge 2009-2010 show 
that the hybrid strategy performs better than the 
state-of-the-art, obtaining solutions with fewer services and 
optimal QoS. 

7 Conclusions 

In this paper we have presented a hybrid algorithm to 
automatically build semantic input-output based compositions, 
minimizing the total number of services while 
guaranteeing the optimal QoS. The proposed approach 
combines a set of graph optimizations and a 
local-global search to extract the optimal composition from 
the graph. Results obtained with the Web Service Challenge 
2009-2010 datasets show that the combination of 
graph optimizations with a local-global search strategy 
performs better than the state-of-the-art, as it obtained 
solutions with fewer services and optimal QoS. Moreover, 
the evaluation with a set of randomly generated datasets 
shows that the hybrid strategy is well suited to perform 
compositions in diverse scenarios, as it can achieve a 
good tradeoff between quality and execution time. 

Appendix A 

Computational Complexity 

The optimal QoS can be computed in 
polynomial time for a given Service Match Graph using 
classical shortest path algorithms such as Dijkstra or 
Bellman-Ford. But, as stated in the introduction, there 
can exist multiple solutions with the same global QoS but 
different number of services. Thus, in many scenarios, 
optimizing the QoS objective function is not enough to 
provide the best possible answer. However, it turns out 
that optimizing the number of services of a composition 
is an intractable problem. The next theorem proves that 
the Service Minimization Problem (SMP) is an NP-Hard 
combinatorial optimization problem. 

Theorem. Finding the minimum number of services whose 
outputs match a given set of unresolved (unmatched) concepts 
is an NP-Hard combinatorial optimization problem. 

Proof: We will show that the Service Minimization 
Problem (SMP) is NP-Hard by proving that the 
optimization version of the Set Cover Problem (SCP), a 
well-known NP-Hard problem, is polynomial-time Karp 
reducible to SMP (SCP $\leq_p$ SMP). The optimization 
version of the SCP problem is defined as follows: given 
a set of elements $U = \{u_1, \ldots, u_m\}$ and a set $S$ of subsets 
of $U$, find the smallest set (cover) $C \subseteq S$ of subsets of $S$ 
whose union is $U$. The decision version of this problem, 
stated as that of deciding whether there exists a cover $C_{SCP}$ 
of size $k$ or less ($|C_{SCP}| \leq k$), is NP-Complete. We will 
also consider the simplest form of the SMP that can 
be contained in a Service Match Graph, which is defined 
as follows: given a service $w_U$ and a set of candidate 
services $W_S = \{w_1, \ldots, w_n\}$ such that $O_{w_1} \otimes I_{w_U} \neq \emptyset 
\wedge \cdots \wedge O_{w_n} \otimes I_{w_U} \neq \emptyset$, select the smallest subset of 
services $W_S' \subseteq W_S$ such that the union of the outputs of 
the services from $W_S'$, $O_{W_S'}$, satisfies $O_{W_S'} \otimes I_{w_U} = I_{w_U}$, 
i.e., the outputs of the services contained in $W_S'$ match all 
the inputs of $w_U$. As in the SCP, the decision version of 
this optimization problem is defined as that of deciding 
whether there exists a subset of candidate services $C_{SMP}$ of 
size $k$ or less ($|C_{SMP}| \leq k$) such that the union of the 
outputs of the services in $C_{SMP}$ match all the inputs of 
$w_U$. 

In order to prove that the SMP optimization problem is 
NP-Hard, we need to demonstrate that its corresponding 
decision problem is NP-Complete. We will therefore 
reduce the SCP problem by means of a function $\varphi$ that 
transforms any arbitrary instance of the SCP into an 
instance of the SMP in polynomial time. We have to 
prove that 1) $\varphi(U, S)$ is an SMP problem; 2) $\varphi$ runs in 
polynomial time; and 3) there is a set covering of $\varphi(U, S)$ 
of size $k$ or less if and only if there is a set covering of 
$U$ in $S$ of size $k$ or less. 

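Given a pair $(U, S)$, we define $\varphi(U, S) = (w_U, W_S)$ such 
that: 

• $w_U = \{I_{w_U} = U = \{u_1, \ldots, u_m\}, \emptyset\}$, where $u_i$ is the 
$i$th unresolved input of $w_U$. 

• $\forall s_i = \{u_{i_1}, \ldots, u_{i_k}\} \in S, \exists w_i \in W_S$ such that $w_i = 
\{\emptyset, O_{w_i}\}$ and $O_{w_i} \otimes I_{w_U} = s_i$. 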

By this definition, $\varphi(U, S)$ maps each element $u \in 
U$ to an input of the service $w_U$. Each subset $s_i \in S$ is 
also mapped to a service whose outputs match exactly 
the inputs of $w_U$ that correspond with the elements of $s_i$. 
This mapping can be computed by adding a match from 
an arbitrary output of each service $w_i \in W_S$ to each input 
$u_i \in s_i$, which clearly runs in linear time in the size of 
$U$. Moreover, $\varphi(U, S)$ is a Service Minimization Problem 
according to its definition. 

Now suppose there is a set covering $C \subseteq S$ of $U$ with 
$|C| \leq k$. Thus, $\forall u \in U, \exists c_i \in C$ such that $u \in c_i$. From the 
services $(w_U, W_S)$ constructed from $(U, S)$ by $\varphi(U, S)$, 
there exists $w_i \in W_S$ such that $O_{w_i} \otimes I_{w_U} = c_i \subseteq I_{w_U}$, 
and so $\bigcup_i (O_{w_i} \otimes I_{w_U}) = I_{w_U}$, i.e., the outputs of 
the services from the set $W_S$ of size $k$ or less represent a 
cover of the Service Minimization Problem $\varphi(U, S)$. 

□ 
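
The construction used in the proof can be sketched in Python as follows. This is only an illustration under simplifying assumptions: services are plain dictionaries, the match relation is reduced to set membership, and the names `scp_to_smp` and `min_services` are hypothetical. The brute-force solver is exponential and is included only to check small instances.

```python
from itertools import combinations

def scp_to_smp(universe, subsets):
    """Map a Set Cover instance (U, S) to an SMP instance (w_U, W_S)."""
    # w_U has one unresolved input per element of U and no outputs.
    w_u = {"inputs": set(universe), "outputs": set()}
    # One candidate service per subset; its outputs match exactly that subset.
    w_s = [{"inputs": set(), "matches": set(s)} for s in subsets]
    return w_u, w_s

def min_services(w_u, w_s):
    """Brute-force SMP: smallest group of services matching all inputs of w_U."""
    for size in range(1, len(w_s) + 1):
        for group in combinations(range(len(w_s)), size):
            covered = set().union(*(w_s[i]["matches"] for i in group))
            if covered >= w_u["inputs"]:
                return list(group)
    return None  # the inputs of w_U cannot be fully matched

universe = {1, 2, 3, 4, 5}
subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
w_u, w_s = scp_to_smp(universe, subsets)
solution = min_services(w_u, w_s)
print(solution)
```

The indices returned for the SMP instance are also the indices of a minimum cover of the original Set Cover instance, which is the correspondence the reduction relies on.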

Appendix B 

Algorithm Analysis and Discussion 

The proposed approach consists of a hybrid algorithm 
that optimizes the global QoS and selects the composition 
with the minimum number of services that preserves 
the optimal QoS. As demonstrated in Appendix A, 
the problem of minimizing the number of services is 
NP-Hard. Thus, under the P ≠ NP assumption, there is 
no polynomial time algorithm that can exactly solve this 
optimization problem. However, although it is in general 
intractable, in practice many instances of the problem, 
as shown in the evaluation section, can be optimally 
solved in reasonable time. In those situations, it may 
be preferable to provide optimal solutions instead of 
just sub-optimal ones. Our approach takes advantage 
of a hybrid strategy that combines a local search and a 
global search plus the use of preprocessing optimizations 
and search optimizations (minimum-remaining-values 
heuristic, cycle detection, QoS bounds propagation) in 
order to achieve a good trade-off between optimality of 
the solution and computation time. Here we analyze the 
complexity of the proposed techniques. 

B.1 Cycle detection 

The cycle detection is implemented as a Look-Ahead 
strategy that traverses all the resolved matches, starting 
from the current service (the one selected to resolve 
a new unresolved input), until no more services are 
reachable. This strategy seeks to discover whether the 
current service is a valid candidate or not by checking 
if it can lead to a dependency cycle, so that it can be 
discarded early. The cycle detection algorithm takes 
$O(|V| + |E|)$ time, since every service, input, output and match 
between inputs and outputs has to be traversed in the 
worst case. 
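
A minimal sketch of such a look-ahead check is shown below. It assumes a simplified representation in which `resolved` maps each service to the services already chosen to resolve its inputs; this dictionary layout and the function name `leads_to_cycle` are assumptions made for illustration only. Each service is expanded at most once and each resolved match is followed at most once, which gives the linear bound.

```python
def leads_to_cycle(resolved, current):
    """Return True if `current` can reach itself through resolved matches."""
    stack = list(resolved.get(current, ()))
    visited = set()
    while stack:
        service = stack.pop()
        if service == current:
            return True          # dependency cycle found
        if service in visited:
            continue             # already expanded
        visited.add(service)
        stack.extend(resolved.get(service, ()))
    return False

acyclic = {"A": ["B"], "B": ["C"], "C": []}
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(leads_to_cycle(acyclic, "A"), leads_to_cycle(cyclic, "A"))
```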

B.2 QoS Update 

The QoS update method calculates the optimal 
end-to-end QoS through the graph. This method is also used to 
recalculate optimal QoS bounds whenever a local QoS 
bound is exceeded. This problem can be modeled as a 
shortest path problem with generalized costs for QoS (as 
shown in Section 4.3) and solved using Dijkstra's 
algorithm. The worst-case time complexity of this algorithm 
is as follows: given a Service Match Graph $G_S = (V, E)$, 
where $W_R \subseteq V$ is the set of services in the graph, there 
are at most $|W_R|$ calls to the POP method to extract the 
lowest scored service from the queue. Since the queue 
is implemented as a binary heap, the POP and INSERT 
methods have a time of $O(\log n)$, where $n$ is the size 
of the queue. Thus, in the worst case, the running time 
is $O(|W_R| \cdot \log|W_R|)$, plus the (at most) $|E|$ updates 
of neighbor services that are reinserted into the queue. 
Therefore, the overall time is $O((|E| + |W_R|) \cdot \log|W_R|)$. 
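
The role of the POP and INSERT operations in this bound can be seen in the following sketch of Dijkstra's algorithm on a binary heap. It is an assumption-laden simplification: it uses plain additive edge costs rather than the generalized QoS costs of Section 4.3, and the graph layout (`{node: [(neighbor, cost), ...]}`) is hypothetical. Updated neighbors are reinserted and stale queue entries are skipped when popped.

```python
import heapq

def dijkstra(graph, source):
    """Return the best additive cost from `source` to every reachable node."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)            # POP: O(log n)
        if cost > best.get(node, float("inf")):
            continue                                  # stale entry from a reinsertion
        for neighbor, weight in graph.get(node, ()):
            candidate = cost + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))  # INSERT: O(log n)
    return best

graph = {
    "s": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("t", 6)],
    "b": [("t", 2)],
}
costs = dijkstra(graph, "s")
print(costs)
```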


B.3 Local search 

This method performs a heuristically guided local search 
to minimize the number of services of the optimal 
end-to-end QoS composition. At each step, it selects the most 
promising candidate, i.e., the one that matches the largest 
number of unresolved inputs and, among those, the one with 
the fewest inputs. If the algorithm gets stuck at some point, i.e., 
it reaches a point where no service can be selected 
without leading to a cyclic dependency, it backtracks to 
try the next most promising candidate service. The 
algorithm calls RANK-RESOLVERS to rank the candidates 
according to the number of unresolved inputs that each 
candidate can match and, in case of a tie, the service 
with fewer inputs is preferred. The sorting of services 
takes $O(n \cdot \log n)$ using merge sort, where $n$ is the 
number of services. Each time a service is selected, the 
method RESOLVE creates an updated copy of the graph 
in $O(|V| + |E|)$. 
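
The ranking criterion can be sketched as a sort with a two-part key. The function below is a hypothetical stand-in for RANK-RESOLVERS, assuming services are dictionaries with `inputs` and `outputs` sets and that matching is plain set intersection instead of semantic matching. Python's built-in `sorted` uses a merge-sort-derived algorithm with $O(n \cdot \log n)$ worst-case cost.

```python
def rank_resolvers(candidates, unresolved):
    """Order candidates: most unresolved inputs matched first, then fewest inputs."""
    def key(service):
        matched = len(service["outputs"] & unresolved)
        return (-matched, len(service["inputs"]))
    return sorted(candidates, key=key)

candidates = [
    {"name": "A", "inputs": {"x", "y"}, "outputs": {"1", "2"}},
    {"name": "B", "inputs": {"x"}, "outputs": {"1", "2"}},
    {"name": "C", "inputs": set(), "outputs": {"1"}},
]
ranked = [s["name"] for s in rank_resolvers(candidates, {"1", "2"})]
print(ranked)
```

Services A and B both match two unresolved inputs, so the tie is broken in favor of B, which requires fewer inputs.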

Assuming non-cyclic dependencies in the Service 
Match Graph, in the worst case the algorithm has to 
select all the services from the graph until no unresolved 
inputs are left. Thus, in the first step $t_{|W_R|}$ the algorithm 
ranks all the $|W_R|$ services in $O(|W_R| \cdot \log|W_R|)$, selects 
the first one and generates a new copy of the graph in 
$O(|V| + |E|)$. The running time of this step is $O(|W_R| \cdot 
\log|W_R|) + O(|V| + |E|) = O(|W_R| \cdot \log|W_R|)$. In the 
next step $t_{|W_R|-1}$ the algorithm ranks $|W_R| - 1$ services, 
selects the best one, creates a copy of the graph and so on. 
Therefore, since each of the $|W_R|$ steps costs at most 
$O(|W_R| \cdot \log|W_R|)$, the asymptotic upper bound of the running 
time of $t_{|W_R|} + t_{|W_R|-1} + \cdots + t_1$ is $O(|W_R|^2 \cdot \log|W_R|)$. 

In the absence of the assumption of non-cyclic 
dependencies, the asymptotic upper bound analysis shows that 
the time complexity grows exponentially with the depth 
of the search, since in the worst case the algorithm fails 
(backtracks) at each step until the last combination of 
services is explored. However, in practice, this upper 
bound seems far from the average case. As shown in 
the evaluation (Section 6), the growth of the time with 
respect to the size of the graph is closer to the 
best-case scenario, since an exponential number of backtracks 
due to cyclic dependencies is extremely rare. In any case, 
the algorithm can be easily adapted to perform better 
in the worst-case scenario, for example by limiting the 
number of candidates to the top-K best services for each 
unresolved input. 



Fig. 8. Reduction of the left graph into the right graph by 
computing all possible combinations of services 

B.4 Global search 

The aim of the global search algorithm is to perform 
an exhaustive search to find the minimum combination 
of services that satisfy the composition request with 
optimal QoS. The algorithm explores every possible valid 
combination of services in a breadth-first fashion by 
resolving one input at a time. For each unresolved input 
with $k > 1$ candidates, $k$ new states are created 
by calling the RESOLVE method and pushed to the 
queue for further expansion. In order to calculate an 
asymptotic upper bound for the time complexity, we can 
compute the number of combinations of services that the 
algorithm needs to extract from the queue in the worst 
case. To this end, we first count the maximum number 
of combinations (solutions) that we can generate for a 
simple graph with fixed size and then we generalize the 
problem for a graph of any size. 

The left graph in Fig. 8 shows an example of a Service 
Match Graph with 4 services (excluding Si and So). As 
can be seen, Si requires two inputs, 1 and 2. On the 
other hand, the outputs of A and B match the input 1 
whereas the outputs of services C and D match the input 
2. Therefore, in order to match both inputs, we can select 
services A and C, A and D, B and C or B and D (2 × 2 
combinations). By computing all possible combinations, 
we can reduce the graph from the left, where Si has two 
inputs, to the graph from the right, where Si has just one 
input. 
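
The enumeration in this example is a Cartesian product of the candidate sets, one set per input. A brief Python sketch, where the `candidates` dictionary is a hypothetical encoding of the left graph of Fig. 8:

```python
from itertools import product

# Candidate services for each input of Si (input 1: A or B, input 2: C or D).
candidates = {1: ["A", "B"], 2: ["C", "D"]}

# One service is chosen per input; every choice is a valid combination.
combos = list(product(*(candidates[i] for i in sorted(candidates))))
print(combos)
```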

In general, given a service $w$ with $|I_w| = k$ inputs 
and $c_1, c_2, \ldots, c_k$ sets of candidate services for each input, 
there are $\prod_{i=1}^{k} |c_i|$ combinations of services, i.e., we can 
replace the $k$ inputs with $k$ sets of candidate services 
by one input with $\prod_{i=1}^{k} |c_i|$ candidates. Since each service 
can have in turn some inputs with other candidates, we 
can recursively replace each service with all the possible 
combinations of services that can be generated. This 
process leads to a flattening of the graph until there 
is just one level with all the possible combinations of 
services (compositions) that can be generated for a given 
Service Match Graph. Thus, the problem of counting the 
number of possible solutions in the worst case can be 
reduced to the following: given a Service Match Graph 
with $|W_R|$ services, what is the maximum of products of 
partitions of $W_R$? More formally, given a set $S$ ($|S| > 1$), 
choose $n$ partitions $c_1, c_2, \ldots, c_n$ such that $\sum_{i=1}^{n} |c_i| = |S|$ 
and $\prod_{i=1}^{n} |c_i|$ is maximized. For example, given 11 services, 
we can take 3 groups of 3 services and one with the 
remaining 2 services, so the product of the partition is 
$3^3 \cdot 2 = 54$, which is the maximum. Finding an upper 
bound for this value will give us an upper bound for the 
maximum number of compositions that can be 
enumerated in the worst case, i.e., for the most complex Service 
Match Graph that can be generated with $|W_R|$ services. It 
can be proved that, for any set of size $n$, the maximum 
can be obtained by partitioning the set into groups of 
2 and 3 elements, with no more than 2 groups of 2 
elements. From this it follows that the maximum product 
is bounded by $3^{n/3}$, so we can conclude that $O(3^{n/3})$ is 
a tight asymptotic upper bound on the running time in 
the worst case. 
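
The maximum product of a partition can be checked numerically for small sizes. The following sketch uses a simple dynamic program over group sizes; the function name `max_partition_product` is an assumption, and the check against $3^{n/3}$ is done in exact integer arithmetic by comparing the cube of the product with $3^n$.

```python
def max_partition_product(n):
    """Largest product of group sizes over all ways to split n elements into groups."""
    best = [1] * (n + 1)  # best[0] = 1 is the empty product
    for total in range(1, n + 1):
        # Choose the size of one group, then split the rest optimally.
        best[total] = max(part * best[total - part] for part in range(1, total + 1))
    return best[n]

print(max_partition_product(11))

# The product never exceeds 3^(n/3), i.e. product^3 <= 3^n.
bound_holds = all(max_partition_product(n) ** 3 <= 3 ** n for n in range(1, 40))
print(bound_holds)
```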

However, it should be noted that although the 
calculation of an optimal solution for the problem in the 
worst case requires exponential time with the size of the 
graph, in practice, the number of services for a particular 
request is usually orders of magnitude lower than the 
number of available services in the dataset (see Tables 2 
and 4). In addition to this, the optimizations introduced 
in Section 5.3 plus the global QoS bound propagation, the 
minimum-remaining-values heuristic and cycle detection 
used in the global search are aimed at further reducing 
the size of the explored search space by decreasing the 
number of analyzed services. 


Appendix C 

Differences with previous work 

In [30] we presented an integrated approach for discovery 
and composition of semantic Web services. However, 
that framework does not include any of the novelties 
that are presented in this approach. Our previous work 
presents an integrated framework for automatic I/O 
driven discovery and composition of semantic Web services 
and analyzes the impact of the discovery in the 
whole process, but with no QoS support. In contrast, in 
this work we present a hybrid composition algorithm 
that optimizes both QoS and the number of services, 
which is a different and harder problem. The main 
differences are: 

• The Service Model has been extended to give support 
for QoS properties. 

• The computation of the Service Match Graph for this 
problem is different. In this work, all the semantic 
matches between all the services in the graph 
are computed in order to be able to guarantee 
an optimal end-to-end QoS. However, in [30], the 
Service Match Graph contains only the matches from 
the outputs of previous layers to the inputs of 
subsequent layers, i.e., the inputs of a service that 
appears in the $i$th layer can be matched only by 
the outputs of services that are in any $j$th layer 
where $j \in [0, i - 1]$. This condition is enough to 
find the smallest composition (in terms of number 
of services and length of the composition) but it 
is not enough to guarantee the optimal QoS since 
there are missing relations that can be part of the 
optimal solution. 

• Service Match Graph optimizations have been extended 
to take into account QoS. Also, a new step 
in the optimization pipeline has been included to 
prune suboptimal QoS services (i.e., services that 
cannot be part of the optimal solution). 

• The proposed composition algorithm is completely 
different. The algorithm from [30] is focused on the 
minimization of Web services using an algorithm 
with admissible state-space pruning. However, 
this technique is not enough to cope with the 
complexity of this new problem at large scale. Thus, 
we developed a new algorithm which consists of 
a hybrid strategy to optimize both global end-to-end 
QoS and the number of services, which is a 
different and also a harder problem. 